439 research outputs found

    Using Property-Based Testing to Generate Feedback for C Programming Exercises


    SMOTE for regression

    Several real-world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a problem of class imbalance, which has already been studied thoroughly within machine learning. For regression tasks, where the target variable is continuous, few works address this type of problem. Still, important application areas involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of task: we propose to address such tasks with sampling approaches. These approaches change the distribution of the given training data set to reduce the imbalance between the rare target cases and the most frequent ones. We present a modification of the well-known SMOTE algorithm that allows its use on these regression tasks. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals for these particular regression tasks. The proposed SmoteR method can be used with any existing regression algorithm, turning it into a general tool for addressing problems of forecasting rare extreme values of a continuous target variable.
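The sampling idea in the abstract above can be illustrated with a minimal sketch. The abstract gives no algorithmic detail, so everything here is an assumption made for illustration: the function name `smoter_oversample`, the simple threshold rule for deciding which cases are "rare", the neighbour count `k`, and the distance-weighted average used for the synthetic target value. It should not be read as the paper's actual procedure.

```python
import math
import random


def smoter_oversample(X, y, threshold, n_new_per_case=1, k=3, seed=0):
    """Create synthetic rare cases by interpolating between rare neighbours.

    Assumption (not from the abstract): a case counts as "rare" when its
    target value exceeds `threshold`.
    """
    rng = random.Random(seed)
    rare = [(x, t) for x, t in zip(X, y) if t > threshold]
    synthetic = []
    for i, (x, t) in enumerate(rare):
        # Candidate neighbours are the other rare cases, nearest first.
        others = [r for j, r in enumerate(rare) if j != i]
        if not others:
            continue
        neighbours = sorted(others, key=lambda r: math.dist(x, r[0]))[:k]
        for _ in range(n_new_per_case):
            nx, nt = rng.choice(neighbours)
            frac = rng.random()
            # New features: a random point on the segment between the two cases.
            new_x = [a + frac * (b - a) for a, b in zip(x, nx)]
            # New target: average of the two targets, weighted so that the
            # closer parent case contributes more.
            d1 = math.dist(new_x, x)
            d2 = math.dist(new_x, nx)
            if d1 + d2 == 0:
                new_t = (t + nt) / 2
            else:
                new_t = (d2 * t + d1 * nt) / (d1 + d2)
            synthetic.append((new_x, new_t))
    return synthetic


# Toy data: three frequent low-valued cases and three rare high-valued ones.
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = [1.0, 1.0, 1.0, 9.0, 10.0, 11.0]
new_cases = smoter_oversample(X, y, threshold=5.0)
```

The synthetic cases would be appended to the training set before fitting any regression model, which is what makes a sampling approach of this kind independent of the learning algorithm.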

    A Benchmark dataset for predictive maintenance

    The paper describes the MetroPT data set, an outcome of an eXplainable Predictive Maintenance (XPM) project with an urban metro public transportation service in Porto, Portugal. The data was collected in 2022 to evaluate machine learning methods for online anomaly detection and failure prediction. By capturing several analog sensor signals (pressure, temperature, current consumption), digital signals (control signals, discrete signals), and GPS information (latitude, longitude, and speed), we provide a dataset that can be easily used to evaluate online machine learning methods. This dataset contains some interesting characteristics and can be a good benchmark for predictive maintenance models.

    Are the States United? An analysis of US hotels’ offers through TripAdvisor’s eyes

    This empirical data-driven research aims to unveil thought-provoking insights on the U.S. hotel offer across its 50 states. Information on more than 30,000 hotels was collected through web scraping from TripAdvisor. Using such data, 50 support vector machine models were trained to model the TripAdvisor score, one per state, to assess the convergent and divergent factors in customer satisfaction across all the U.S. states. A conceptual model is proposed and validated through the data-driven support vector machine models developed for each state to identify convergent features across the states to explain customer satisfaction (here represented by TripAdvisor score). Hotel size, price, and stars are not moderated by the location, expressed by the corresponding state, although these highly influence satisfaction, whereas both hotel number of published photos and the amenities are affected by the location. Thus, adaptation issues were found regarding amenities and published photos within each state’s offer.

    Leveraging national tourist offices through data analytics

    Purpose: This study aims to propose a data-driven approach, based on open-source tools, that makes it possible to understand customer satisfaction of the accommodation offer of a whole country. Design/methodology/approach: The method starts by extracting information from all hotels of Portugal available at TripAdvisor through Web scraping. Then, a support vector machine is adopted for modeling the TripAdvisor score, which is considered a proxy of customer satisfaction. Finally, knowledge extraction from the model is achieved using sensitivity analysis to unveil the influence of features on the score. Findings: The model of the TripAdvisor score achieved a mean absolute percentage error of around 5 per cent, proving the value of modeling the extracted data. The number of rooms of the unit and the minimum price are the two most relevant features, showing that customers appreciate smaller and more expensive units, whereas the location of the hotel does not hold significant relevance. Originality/value: National tourist offices can use the proposed approach to understand what drives tourists’ satisfaction, helping to shape a country’s strategy. For example, licensing new hotels may take into account the unit size and other characteristics that make it more attractive to tourists. Furthermore, the procedure can be replicated at any time and in any country, making it a valuable tool for data-driven decision support on a national scale.
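For readers unfamiliar with the error measure quoted in the findings above, the following is a minimal sketch of how a mean absolute percentage error is computed. The function name `mape` and the example scores are hypothetical and chosen only to show a result of 5 per cent; they are not data from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in per cent.

    Each error is taken relative to the actual value, so actual values
    must be non-zero.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical scores: each prediction is off by 5 per cent of the true value.
example_error = mape([4.0, 5.0], [3.8, 5.25])
```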

    Characterization of two-year progression of neurodegeneration in different risk phenotypes of diabetic retinopathy

    To characterize the two-year progression of neurodegeneration in different diabetic retinopathy (DR) risk phenotypes in type 2 diabetes.